The data are at the protein group level (from the ‘proteinGroups.txt’ file). A total of 4,313 protein groups were observed in at least one of the six samples. There were two treatments, no mineral and mineral, with three replicate samples per treatment.

Data Preprocessing

Abundance values were log2 transformed and all non-observed values were assigned a value of NA.
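The transformation above can be sketched as follows. This is a minimal illustration, not the code used for this report: it assumes non-observed values are recorded as 0 (or are already missing) in the raw table, uses `None` to stand in for NA, and the protein group names and abundances are made up.

```python
import math

def log2_or_na(value):
    """Return log2(value), or None (standing in for NA) when not observed."""
    # Assumption: a raw abundance of 0 (or a missing entry) means "not observed".
    if value is None or value <= 0:
        return None
    return math.log2(value)

# Hypothetical raw abundances for two protein groups across three samples.
raw = {"PG1": [1024.0, 0.0, 2048.0], "PG2": [0.0, 0.0, 512.0]}
logged = {pg: [log2_or_na(v) for v in values] for pg, values in raw.items()}
```

Mapping non-observed values to NA rather than to a number keeps them out of later means and medians.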

Data Filtering

All potential contaminants and reverse hits were removed. Additionally, protein groups were filtered from the data if 1) the majority protein identifier consisted only of orthologs or 2) the majority protein identifier had multiple protein groups associated with the organism listed. Finally, any protein groups with too few observations to conduct either a quantitative or a qualitative statistical comparison were removed; that is, a protein group was retained only if it had at least two observed values in each treatment group or at least three observed values in one group. Figure 1 shows the log2 transformed abundance profiles before (left) and after (right) filtering was performed. Filtering did not change the distributions of the abundance profiles. Table 1 gives the number of protein groups removed at each stage of filtering. The final dataset consisted of 2,808 protein groups.

Table 1: Number of protein groups removed by each filter applied to the data

Filter                    Number Removed
Contaminants              11
Reverse Hits              61
Orthologs/Double Hits     475
Observation Filter        958

Figure 1: Log2 abundance profiles for each sample before (left) and after (right) filtering
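The observation filter and the counts in Table 1 can be sketched as below. This is an illustrative reading of the rule, assuming "observed" means a non-NA value (`None` stands in for NA) and that the quantitative comparison needs at least two observations in each treatment group; the function names are hypothetical.

```python
def count_observed(values):
    """Number of non-missing values in one treatment group."""
    return sum(v is not None for v in values)

def passes_observation_filter(no_mineral, mineral):
    """Keep a protein group if either comparison is possible."""
    n_no_mineral = count_observed(no_mineral)
    n_mineral = count_observed(mineral)
    # Quantitative comparison: at least two observed values in each group.
    quantitative = n_no_mineral >= 2 and n_mineral >= 2
    # Qualitative comparison: at least three observed values in one group.
    qualitative = n_no_mineral >= 3 or n_mineral >= 3
    return quantitative or qualitative

# Consistency check of Table 1 against the reported totals.
removed = {
    "Contaminants": 11,
    "Reverse Hits": 61,
    "Orthologs/Double Hits": 475,
    "Observation Filter": 958,
}
remaining = 4313 - sum(removed.values())
```

With three replicates per treatment, the qualitative condition is met only when a protein group is observed in every replicate of at least one treatment.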

Normalization

SPANS (Webb-Robertson et al. 2011) was run on the data to evaluate potential normalization strategies. Based on these results, data were normalized via median centering. Figure 2 shows the normalized log2 transformed abundance profiles for each sample.

Figure 2: Normalized log2 abundance profiles for each sample
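Median centering can be sketched as follows. The sketch assumes the centering is applied per sample, by subtracting the median of that sample's observed log2 abundances, with `None` standing in for NA; the report does not state these details, so treat them as assumptions.

```python
from statistics import median

def median_center(sample_values):
    """Subtract the sample's median observed log2 abundance from each value."""
    observed = [v for v in sample_values if v is not None]
    center = median(observed)
    # Missing values stay missing; observed values are shifted by the median.
    return [None if v is None else v - center for v in sample_values]

# Hypothetical log2 abundances for one sample.
centered = median_center([10.0, None, 12.0, 14.0])
```

After centering, each sample has a median of zero on the log2 scale, which puts the abundance profiles of all samples on a common location.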

Statistical Analysis

Differential Analyses

A one-way analysis of variance (ANOVA) was run for each protein group to compare mean abundances of samples from the two conditions. Additionally, a G-test (Webb-Robertson et al. 2010) was run to test for differences in presence/absence patterns with a null hypothesis that presence/absence patterns are not related to biological group. Figure 3 shows the number of significant protein groups by direction of expression change for both tests. Figure 4 gives a volcano plot showing the results from the ANOVA analyses.

Figure 3: Number of significant protein groups (p-value \(\leq\) 0.05) by test and direction of change

Figure 4: Volcano plot of ANOVA results. Protein groups with a p-value \(\leq\) 0.05 are colored red

Filtered and normalized data are in the file ‘fusarium_normalized_data.csv’. Statistical results are in the file ‘fusarium_stat_results.csv’. Table 2 gives the names of the columns in the statistical results file and a description of the values in each column.

Table 2: Description of columns in statistical results file

Column                     Description
Majority.protein.IDs       from original data output
Protein.IDs                from original data output
Peptide.counts..unique     from original data output
NObs_NoMineral             number of samples from No Mineral treatment with observed abundance
NObs_Mineral               number of samples from Mineral treatment with observed abundance
Mean_NoMineral             mean normalized log2 abundance for No Mineral samples
Mean_Mineral               mean normalized log2 abundance for Mineral samples
pvalue_Gtest_MvsNoM        G-test p-value
pvalue_ANOVA_MvsNoM        ANOVA p-value comparing mean abundances
Log2FC_MvsNoM              Log2 fold-change of group means (M/NoM)
Flag_0.05_ANOVA_MvsNoM     Flag indicating direction of quantitative change (0: not sig. different, 1: sig. up expressed in Mineral, -1: sig. down expressed in Mineral)
Flag_0.05_Gtest_MvsNoM     Flag indicating direction of qualitative change (0: not sig. different, 1: observed more in Mineral, -1: observed less in Mineral)
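The relationship between the fold-change and flag columns in Table 2 can be sketched as follows. This is an assumed reading of the column descriptions, not the code that produced the file: the log2 fold-change is taken as Mean_Mineral minus Mean_NoMineral (a ratio M/NoM becomes a difference on the log2 scale), significance is taken as a p-value of at most 0.05, and the function names are hypothetical.

```python
def log2_fold_change(mean_mineral, mean_no_mineral):
    """Log2 fold-change (M/NoM) from mean normalized log2 abundances."""
    return mean_mineral - mean_no_mineral

def direction_flag(p_value, difference, alpha=0.05):
    """Return 1 (higher in Mineral), -1 (lower in Mineral), or 0 (not significant).

    For the ANOVA flag, `difference` is the log2 fold-change; for the G-test
    flag, it is NObs_Mineral minus NObs_NoMineral.
    """
    if p_value is None or p_value > alpha or difference == 0:
        return 0
    return 1 if difference > 0 else -1

# Hypothetical values for one protein group.
fold_change = log2_fold_change(mean_mineral=1.5, mean_no_mineral=0.25)
anova_flag = direction_flag(0.01, fold_change)
gtest_flag = direction_flag(0.004, 0 - 3)
```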

Trelliscope Plots

Data are visualized in boxplots of log2 abundance against treatment. All plots are collected into a trelliscope display, which allows you to cycle through all protein groups and to filter, sort, and label plots by values such as p-values, fold-changes, and protein names. These metrics are named similarly to the columns in the statistical results flat file.

  • The ‘Grid’ icon on the left will allow you to change how many plots (features) will be shown at once.
  • The ‘Labels’ button will let you see what metrics and statistics are available for each feature and choose which to display below the figure.
  • The ‘Filter’ button will let you choose a metric/statistic and specify a range on which to filter down the features. For example, one could filter on the range 0 - 0.05 on pvalue_ANOVA to show only protein groups for which the null hypothesis of no difference in mineral and no mineral mean abundances was rejected at a significance level of 0.05.
  • Finally, the ‘Sort’ button will let you sort the plots by a statistic/metric. By default, the plots are sorted by the protein name from the organism of interest. To remove this default sorting and sort by something else, click the ‘x’ inside the blue icon reading ‘Protein’ at the very bottom left.
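The filter and sort steps described above can also be reproduced on the flat file of statistical results. The sketch below uses a few made-up rows in place of ‘fusarium_stat_results.csv’; the column names follow Table 2, while the data values and the use of "NA" for missing entries are assumptions for illustration.

```python
import csv
import io

# Made-up rows for illustration only; column names follow Table 2.
example_csv = io.StringIO(
    "Majority.protein.IDs,pvalue_ANOVA_MvsNoM,Log2FC_MvsNoM\n"
    "protA,0.003,1.8\n"
    "protB,0.410,0.2\n"
    "protC,NA,NA\n"
    "protD,0.049,-0.9\n"
)

def filter_by_range(rows, column, low=0.0, high=0.05):
    """Keep rows whose numeric value in `column` lies within [low, high]."""
    kept = []
    for row in rows:
        try:
            value = float(row[column])
        except ValueError:
            continue  # "NA" and other non-numeric entries are skipped
        if low <= value <= high:
            kept.append(row)
    return kept

rows = list(csv.DictReader(example_csv))
significant = filter_by_range(rows, "pvalue_ANOVA_MvsNoM")
# Sort the remaining protein groups by fold-change, smallest first.
significant.sort(key=lambda row: float(row["Log2FC_MvsNoM"]))
```

To run this against the real file, replace `example_csv` with an open handle to ‘fusarium_stat_results.csv’.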

Exploratory Data Analysis

Sequential projection pursuit principal component analysis (PCA) was run (Webb-Robertson et al. 2013); this method has the benefit that missing data do not need to be imputed for the algorithm to run. Figure 5 shows the first two principal component scores for each sample with points colored by group.

Figure 5: Scores for the first two principal components, based on normalized protein group abundance profiles, for each sample with points colored by group

Figures 6 - 8 compare each protein group from the organism of interest with properties of the orthologs mapping to the same protein group. All plots are interactive, and points can be toggled on and off in the figure by clicking on the legend markers. Figure 6 shows the number of peptides mapping to the organism of interest for a protein group (x-axis) and the number of peptides from the ortholog with the maximum number of peptides mapping to the same protein group (y-axis); points are colored by direction of change based on ANOVA. Figure 7 gives a similar plot, but with the total number of peptides mapping to ortholog(s) on the y-axis. Finally, Figure 8 is also similar, but with the total number of ortholog proteins on the y-axis.

Figure 6: Number of peptides mapping to organism vs the number of peptides associated with the ortholog with the maximum number of peptides. All points are colored by direction of change based on ANOVA results

Figure 7: Number of peptides mapping to organism vs the total number of peptides associated with all ortholog proteins. All points are colored by direction of change based on ANOVA results

Figure 8: Number of peptides mapping to organism vs the total number of ortholog proteins. All points are colored by direction of change based on ANOVA results

References

Webb-Robertson, Bobbie-Jo M, Melissa M Matzke, Jon M Jacobs, Joel G Pounds, and Katrina M Waters. 2011. “A Statistical Selection Strategy for Normalization Procedures in LC-MS Proteomics Experiments Through Dataset-Dependent Ranking of Normalization Scaling Factors.” Proteomics 11 (24): 4736–41.

Webb-Robertson, Bobbie-Jo M, Melissa M Matzke, Thomas O Metz, Jason E McDermott, Hyunjoo Walker, Karin D Rodland, Joel G Pounds, and Katrina M Waters. 2013. “Sequential Projection Pursuit Principal Component Analysis–Dealing with Missing Data Associated with New-Omics Technologies.” Biotechniques 54 (3): 165–68.

Webb-Robertson, Bobbie-Jo M, Lee Ann McCue, Katrina M Waters, Melissa M Matzke, Jon M Jacobs, Thomas O Metz, Susan M Varnum, and Joel G Pounds. 2010. “Combined Statistical Analyses of Peptide Intensities and Peptide Occurrences Improves Identification of Significant Peptides from MS-Based Proteomics Data.” Journal of Proteome Research 9 (11): 5748–56.